Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Make PartitionedOutput reclaimable #11856

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tanjialiang
Copy link
Contributor

@tanjialiang tanjialiang commented Dec 13, 2024

Partitioned output spill work as following:
As soon as partitioned output is triggered to reclaim from itself, this task's output buffer switches to spill mode, meaning:

  • Any time the output buffer reaches its max in memory buffer bytes, it spills the data to disk, instead of waiting for consumers to retrieve.
  • Any consumer retrieving will not receive any data, until all current task's results have been produced and spilled.
  • When all current task's results have been produced and spilled, consumer will retrieve the unspilled data directly from disk.

Partitioned output spill currently only works for PARTITION mode. This PR has limitations of not efficiently save memory by spilling, due to unspilling using a considerably large amount of memory. But it lays a foundation for persistent shuffle. Future changes are coming, to make this spill more coordinated with other plan nodes' spill so that node memory usage is more efficient.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 13, 2024
Copy link

netlify bot commented Dec 13, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 1a0127e
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6786befefda59e0008e98819

@@ -108,43 +108,8 @@ velox::memory::MemoryPool* DriverCtx::addOperatorPool(

std::optional<common::SpillConfig> DriverCtx::makeSpillConfig(
int32_t operatorId) const {
const auto& queryConfig = task->queryCtx()->queryConfig();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to keep it here and lazy initialize the spill component of outputbuffer in partitioned output operator.

@facebook-github-bot
Copy link
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tanjialiang tanjialiang force-pushed the po_spill branch 2 times, most recently from 48fdbfc to a0672ae Compare December 30, 2024 07:57
@facebook-github-bot
Copy link
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tanjialiang tanjialiang force-pushed the po_spill branch 4 times, most recently from 38d3924 to 65d5c88 Compare January 14, 2025 06:53
@tanjialiang tanjialiang requested a review from xiaoxmeng January 14, 2025 07:03
@tanjialiang tanjialiang marked this pull request as ready for review January 14, 2025 07:03
@facebook-github-bot
Copy link
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants